IOP Conf. Series: Earth and Environmental Science 1151 (2023) 012049
In the context of the aviation industry, discovering data stories is an active research topic because the
ecosystem is fragmented, with a gap between data scientists and domain experts in the industry [5]. It is
important for the industry to investigate and extract data stories from aviation schedules, and to identify
patterns in flight operations, so that operators can understand the current situation. One of the most
significant issues is flight delay, as it is a performance indicator of any transportation system. Notably,
commercial aviation defines delay as the amount of time by which an aircraft runs late relative to its
schedule [6]. The U.S. Department of Transportation reported that flight problems were the largest
category of complaints received in June 2022, with 1,686 complaints (28.8%) concerning cancellations,
delays, or other deviations from airlines' schedules [7]. This demonstrates the importance of analysing
flight delays and their causes.
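As a concrete reading of this definition, a flight's delay can be computed as the difference between its actual and scheduled times. The Python sketch below assumes both times are available as `datetime` values; the function names `delay_minutes` and `is_delayed` are hypothetical, and the 15-minute threshold is borrowed from the U.S. DOT on-time convention rather than taken from this study.

```python
from datetime import datetime


def delay_minutes(scheduled: datetime, actual: datetime) -> float:
    """Minutes by which the actual time trails the scheduled time (negative = early)."""
    return (actual - scheduled).total_seconds() / 60.0


def is_delayed(scheduled: datetime, actual: datetime, threshold_min: float = 15.0) -> bool:
    """Flag a flight as delayed when it runs late by at least threshold_min minutes.

    The 15-minute default follows the U.S. DOT on-time convention; it is an
    assumption here, since the study does not state a threshold.
    """
    return delay_minutes(scheduled, actual) >= threshold_min


sched = datetime(2022, 6, 1, 8, 30)
actual = datetime(2022, 6, 1, 9, 5)
print(delay_minutes(sched, actual))  # 35.0
print(is_delayed(sched, actual))     # True
```

The same function applies to both departure and arrival delays; only the pair of timestamps passed in changes.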
In their work, Vishwakarma et al. noted that delays have effects ranging from monetary losses for
passengers to passenger dissatisfaction [8]. They reported that flight delays are caused by many diverse
factors, including adverse weather conditions, technical problems, and air-traffic restrictions. They used
big data technologies such as Hadoop MapReduce, Apache Hive, Apache Pig, the Hadoop Distributed
File System (HDFS), and MySQL in the analysis. Their results show that bad weather has a major impact
on flight delays. However, the study did not present any data visualisation output. Another study, by
Kumar et al. [9], proposed a flight delay prediction method that applies computational and machine
learning approaches. They also presented a timeline of major works that depicts the relationships
between flight delay factors. Their work used some visuals, but not in an interactive dashboard. Richard
et al. [10] worked on an expert system that acts as an intelligent agent to interrogate past aircraft
occurrences using a Fuzzy Logic System (FLS). The expert system is presented graphically and
interactively using AcciMaps. However, few visuals were assessed or discussed in their work. Other
works [11,12,13] have also been reviewed, but work on
aviation datasets and dashboard development is extremely limited. Thus, this research investigates and
presents the process of analysing, designing, and developing a dashboard with visuals for monitoring
aviation operations. To demonstrate the findings, this study presents a developed prototype. To meet this
purpose, the study's objectives are as follows: (i) to identify, select, and prepare relevant attributes for
monitoring aviation delays and their causes; (ii) to design visual objects that convey the data stories of
these attributes; and (iii) to formatively assess the dashboard's functionality and the usability of its data
stories.
This paper is organised as follows. Section 1, the introduction (I), covers the study background, problem
definition, objectives, and related work. Section 2 presents the methods and materials (M), followed by
Section 3, which presents the results (R) and discussion (D). Concluding remarks and proposed future
work are given in the final section. Within the dataset, the data stories intended to be obtained are: (i) the
monthly trend of scheduled and actual flights, the distribution of flights by state (highlighting the states
with the most flights), and the busiest airline; (ii) the pattern of on-time, delayed, and cancelled flight
statuses, and flight punctuality based on the duration of delays at departure and arrival; and (iii) the most
likely causes of flight delays and the reasons for flight cancellations.
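These three groups of data stories reduce to a handful of counts and rankings over flight records. The following Python sketch illustrates one way to compute them; the `Flight` record layout, its field names, the toy data, and the 15-minute delay threshold are all hypothetical. Every record is assumed to be a scheduled flight, and operated (actual) flights are assumed to be those that were not cancelled.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Flight:
    # Hypothetical record layout; the study's actual attribute names may differ.
    month: int                      # 1-12, derived from the flight date
    airline: str
    state: str                      # origin state
    cancelled: bool
    arr_delay_min: Optional[float]  # None for cancelled flights
    delay_cause: Optional[str] = None
    cancel_reason: Optional[str] = None


def status(f: Flight, threshold_min: float = 15.0) -> str:
    """Classify a flight as cancelled, delayed, or on-time (15-minute threshold assumed)."""
    if f.cancelled:
        return "cancelled"
    if f.arr_delay_min is not None and f.arr_delay_min >= threshold_min:
        return "delayed"
    return "on-time"


def top(values: Iterable[Optional[str]]) -> Optional[str]:
    """Most frequent non-empty value, or None if there is none."""
    counts = Counter(v for v in values if v)
    return counts.most_common(1)[0][0] if counts else None


def data_stories(flights: list[Flight]) -> dict:
    """Aggregate the figures behind data stories (i)-(iii)."""
    return {
        # (i) monthly scheduled vs. operated flights, top state, busiest airline
        "scheduled_by_month": dict(Counter(f.month for f in flights)),
        "actual_by_month": dict(Counter(f.month for f in flights if not f.cancelled)),
        "top_state": top(f.state for f in flights),
        "busiest_airline": top(f.airline for f in flights),
        # (ii) distribution of on-time, delayed, and cancelled flights
        "status_counts": dict(Counter(status(f) for f in flights)),
        # (iii) most frequent delay cause and cancellation reason
        "top_delay_cause": top(f.delay_cause for f in flights if status(f) == "delayed"),
        "top_cancel_reason": top(f.cancel_reason for f in flights if f.cancelled),
    }


flights = [
    Flight(1, "AL1", "StateA", False, 5.0),
    Flight(1, "AL1", "StateA", False, 40.0, delay_cause="weather"),
    Flight(1, "AL2", "StateB", True, None, cancel_reason="technical"),
    Flight(2, "AL1", "StateA", False, 20.0, delay_cause="weather"),
]
print(data_stories(flights)["busiest_airline"])  # AL1
```

In a dashboard, each entry of the returned dictionary would feed one visual object, for example a monthly line chart comparing `scheduled_by_month` with `actual_by_month`.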
2. Methods and Materials
This study consists of four steps, as shown in Figure 1. The first step was to obtain the data from
relevant sources, which required technical skills such as MySQL to analyse the data. The second step
involved data examination, transformation, and preparation, including data scrubbing, filtering, and
sampling. Data scrubbing also included imputing values for missing data and deriving new attributes.
The third step was to explore the data to obtain an overview of each attribute's pattern. Different data
types, such as numerical and categorical (ordinal and nominal) data, require different treatments.
Exploratory Data Analysis (EDA) was applied iteratively in steps 2 and 3, as it helps to identify the
pattern of the data in each independent attribute. The fourth step was to model the data stories as visual
representations and to arrange the visual objects on a dashboard.
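Steps 2 and 3 can be illustrated with a minimal Python sketch that uses only the standard library. It assumes the records have already been extracted in step 1 (for example, through a MySQL query) into dictionaries; the field names, the median imputation strategy, the 15-minute delay threshold, the fixed sampling seed, and the helper names `scrub`, `sample_rows`, and `profile` are illustrative assumptions, not the procedure used in this study.

```python
import random
import statistics
from collections import Counter
from datetime import date


def scrub(records: list[dict]) -> list[dict]:
    """Step 2 sketch: filter incomplete rows, impute missing delays, derive attributes."""
    # Filtering: keep rows that have a flight date and an airline; copy so input is untouched.
    rows = [dict(r) for r in records if r.get("flight_date") and r.get("airline")]

    # Imputation (assumed strategy): median of the observed arrival delays of operated flights.
    observed = [r["arr_delay_min"] for r in rows
                if not r.get("cancelled") and r.get("arr_delay_min") is not None]
    fill = statistics.median(observed) if observed else 0.0

    for r in rows:
        cancelled = bool(r.get("cancelled"))
        if not cancelled and r.get("arr_delay_min") is None:
            r["arr_delay_min"] = fill
        # Derived attributes: flight month and a delayed flag (15-minute threshold assumed).
        r["month"] = date.fromisoformat(r["flight_date"]).month
        r["is_delayed"] = not cancelled and r["arr_delay_min"] >= 15
    return rows


def sample_rows(rows: list[dict], k: int, seed: int = 42) -> list[dict]:
    """Sampling: a reproducible random subset of at most k rows."""
    return random.Random(seed).sample(rows, min(k, len(rows)))


def profile(rows: list[dict], numeric: list[str], categorical: list[str]) -> dict:
    """Step 3 sketch (EDA): summarise each attribute according to its data type."""
    summary: dict = {}
    for col in numeric:
        values = [r[col] for r in rows if r.get(col) is not None]
        if not values:
            summary[col] = {"count": 0}
            continue
        summary[col] = {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
        }
    for col in categorical:
        # Frequency table for nominal or ordinal attributes.
        summary[col] = dict(Counter(r[col] for r in rows if r.get(col) is not None))
    return summary


raw = [
    {"flight_date": "2022-06-01", "airline": "AL1", "cancelled": False, "arr_delay_min": 5.0},
    {"flight_date": "2022-06-02", "airline": "AL2", "cancelled": False, "arr_delay_min": None},
    {"flight_date": "2022-07-01", "airline": "AL1", "cancelled": False, "arr_delay_min": 45.0},
    {"flight_date": "2022-07-03", "airline": "AL1", "cancelled": True, "arr_delay_min": None},
    {"flight_date": None, "airline": "AL2", "cancelled": False, "arr_delay_min": 10.0},
]
rows = scrub(raw)
print(profile(rows, numeric=["arr_delay_min"], categorical=["airline", "month"]))
print(len(sample_rows(rows, 2)))  # 2
```

Median imputation is used in this sketch because it is less sensitive than the mean to a few very long delays; other strategies may suit the actual dataset better, and in practice the EDA summaries from `profile` would inform that choice on each iteration.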